
    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Binary code analysis allows analyzing binary code without access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text in various natural languages. We notice that binary code analysis and NLP share many analogous topics, such as semantics extraction, summarization, and classification. This work uses these ideas to address two important code similarity comparison problems: (I) given a pair of basic blocks from different instruction set architectures (ISAs), determining whether their semantics are similar; and (II) given a piece of code of interest, determining whether it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system, INNEREYE, and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency, and scalability. Case studies using the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis.
    Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201

    Differential glucocorticoid metabolism in patients with persistent versus resolving inflammatory arthritis

    Introduction: Impairment in the ability of the inflamed synovium to generate cortisol has been proposed to be a factor in the persistence and severity of inflammatory arthritis. In the inflamed synovium, cortisol is generated from cortisone by the 11β-hydroxysteroid dehydrogenase type 1 (11β-HSD1) enzyme. The objective of this study was to determine the role of endogenous glucocorticoid metabolism in the development of persistent inflammatory arthritis. Methods: Urine samples were collected from patients with early arthritis (symptom duration ≤12 weeks) whose final diagnostic outcomes were established after clinical follow-up, and from patients with established rheumatoid arthritis (RA). No patient was taking disease-modifying anti-rheumatic drugs at the time of sample collection. Systemic measures of glucocorticoid metabolism were assessed in the urine samples by gas chromatography/mass spectrometry. Clinical data, including CRP and ESR, were also collected at baseline. Results: Systemic measures of 11β-HSD1 activity were significantly higher in patients with early arthritis whose disease went on to persist, and also in the subgroup of patients with persistent disease who developed RA, when compared with patients whose synovitis resolved over time. We observed a significant positive correlation between systemic 11β-HSD1 activity and ESR/CRP in patients with established RA, but not in any of the early arthritis patient groups. Conclusions: The present study demonstrates that patients with new-onset synovitis whose disease subsequently resolved had significantly lower levels of systemic 11β-HSD1 activity than patients whose synovitis developed into RA or other forms of persistent arthritis. Low absolute levels of 11β-HSD1 activity therefore do not appear to be a major contributor to the development of RA, and it is possible that high total-body 11β-HSD1 activity during early arthritis may reduce the probability of disease resolution.

    Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection

    Existing code similarity comparison methods, whether source or binary code based, are mostly not resilient to obfuscation. In the case of software plagiarism, emerging obfuscation techniques have made automated detection increasingly difficult. In this paper, we propose a binary-oriented, obfuscation-resilient method based on a new concept, the longest common subsequence of semantically equivalent basic blocks, which combines rigorous program semantics with longest-common-subsequence-based fuzzy matching. We model the semantics of a basic block by a set of symbolic formulas representing the input-output relations of the block. This way, the semantic equivalence (and similarity) of two blocks can be checked by a theorem prover. We then model the semantic similarity of two paths using the longest common subsequence with basic blocks as elements. This novel combination has resulted in strong resilience to code obfuscation. We have developed a prototype, and our experimental results show that our method is effective and practical when applied to real-world software.
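    The path comparison described above can be sketched as a longest-common-subsequence computation in which two elements match when their blocks are semantically equivalent, rather than textually equal. This sketch is hypothetical: the names `blocks_equivalent`, `lcs_length` and `path_similarity`, the model of a block as a two-input function, and normalizing by the plaintiff path length are all assumptions. Exhaustive testing over 8-bit inputs stands in for the theorem-prover check on symbolic formulas that the abstract describes.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical block model: a basic block's semantics is a function from two
# input values to one output value. The described method instead uses sets of
# symbolic input-output formulas checked by a theorem prover; exhaustive
# testing over a tiny bit width is only a stand-in for that check.
Block = Callable[[int, int], int]
WIDTH = 8
MASK = (1 << WIDTH) - 1


def blocks_equivalent(f: Block, g: Block) -> bool:
    """Return True if f and g agree on every pair of WIDTH-bit inputs."""
    return all(
        (f(a, b) & MASK) == (g(a, b) & MASK)
        for a, b in product(range(1 << WIDTH), repeat=2)
    )


def lcs_length(path_a: Sequence[T], path_b: Sequence[T],
               equivalent: Callable[[T, T], bool]) -> int:
    """Classic LCS dynamic program with a pluggable element-match predicate."""
    n, m = len(path_a), len(path_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if equivalent(path_a[i - 1], path_b[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def path_similarity(plaintiff: Sequence[Block], suspect: Sequence[Block]) -> float:
    """Fraction of plaintiff blocks matched, in order, by equivalent suspect blocks."""
    if not plaintiff:
        return 0.0
    return lcs_length(plaintiff, suspect, blocks_equivalent) / len(plaintiff)


plaintiff = [
    lambda a, b: a + b,
    lambda a, b: a * 2,
    lambda a, b: a ^ b,
]
suspect = [
    lambda a, b: b + a,               # operands swapped
    lambda a, b: 0,                   # inserted junk block
    lambda a, b: a << 1,              # multiply rewritten as a shift
    lambda a, b: (a | b) & ~(a & b),  # XOR rewritten with OR/AND/NOT
]
print(path_similarity(plaintiff, suspect))  # prints 1.0
```

    In this toy run the suspect path reorders operands, replaces a multiplication with a shift, rewrites XOR, and inserts a junk block, yet all three plaintiff blocks are still matched in order. Exhaustive testing only works for tiny bit widths; for realistic blocks a solver-based check, as in the described method, would be needed.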